Multi-document Summarization Based on Atomic Semantic Events and Their Temporal Relationships
نویسندگان
چکیده
Automatic multi-document summarization (MDS) is the process of extracting the most important information such as events and entities from multiple natural language texts focused on the same topic. We extract all types of semantic atomic information and feed them to a topic model to experiment with their effects on a summary. We design a coherent summarization system by taking into account the sentence relative positions in the original text. Our generic MDS system has outperformed the best recent multi-document summarization system in DUC 2004 in terms of ROUGE-1 recall and f1-measure. Our query-focused summarization system achieves a statistically similar result to the state-ofthe-art unsupervised system for DUC 2007 query-focused MDS task in ROUGE-2 recall measure. Update Summarization is a new form of MDS where novel yet salience sentences are chosen as summary sentences based on the assumption that the user has already read a given set of documents. In this thesis, we present an event based update summarization where the novelty is detected based on the temporal ordering of events and the saliency is ensured by event and entity distribution. To our knowledge, no other study has deeply investigated the effects of the novelty information acquired from the temporal ordering of events (assuming that a sentence contains one or more events) in the domain of update MDS. Our update MDS system has outperformed the state-of-the-art update MDS system in terms of ROUGE-2, and ROUGE-SU4 recall measures. Our MDS systems also generate quality summaries which are manually evaluated based on popular evaluation criteria.
منابع مشابه
Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction
This paper presents a model that incorporates contemporary theories of tense and aspect and develops a new framework for extracting temporal relations between two sentence-internal events, given their tense, aspect, and a temporal connecting word relating the two events. A linguistic constraint on event combination has been implemented to detect incorrect parser analyses and potentially apply s...
متن کاملTemporal multi-page summarization
With the increasing popularity of the Web, efficient approaches to the information overload are becoming more necessary. Summarization of web pages aims at detecting the most important contents from pages so that a user can obtain a compact version of a web document or a group of pages. Traditionally, summaries are constructed on static snapshots of web pages. However, web pages are dynamic obj...
متن کاملImproved Affinity Graph Based Multi-Document Summarization
This paper describes an affinity graph based approach to multi-document summarization. We incorporate a diffusion process to acquire semantic relationships between sentences, and then compute information richness of sentences by a graph rank algorithm on differentiated intra-document links and inter-document links between sentences. A greedy algorithm is employed to impose diversity penalty on ...
متن کاملMulti-Topic Multi-Document Summarization
Summarization of multiple documents featuring multiple topics is discussed. The example trea.ted here consists of fifty articles about the Peru hostage incident tbr ])ecember 1996 through April 1997. They include a. lot of topics such as opening, negotiation, ending, and so on. The method proposed in this paper is based on spreading activation over documents syntactically and semantically annot...
متن کاملReducing Redundancy in Multi-document Summarization Using Lexical Semantic Similarity
We present an automatic multi-document summarization system for Dutch based on the MEAD system. We focus on redundancy detection, an essential ingredient of multi-document summarization. We introduce a semantic overlap detection tool, which goes beyond simple string matching. Our results so far do not confirm our expectation that this tool would outperform the other tested methods.
متن کامل